Dataset statistics
| Number of variables | 8 |
|---|---|
| Number of observations | 392 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 27.6 KiB |
| Average record size in memory | 72.0 B |
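The two memory figures above are consistent with each other. As a rough arithmetic check, assuming every record occupies the reported average of 72 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 392
avg_record_bytes = 72.0

# Total size = rows x average bytes per row, converted to KiB (1 KiB = 1024 B).
total_kib = n_observations * avg_record_bytes / 1024

print(f"{total_kib:.1f} KiB")  # prints "27.6 KiB"
```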
Variable types
| Numeric | 8 |
|---|---|
Alerts
| age is highly overall correlated with preg | High correlation |
|---|---|
| insu is highly overall correlated with plas | High correlation |
| mass is highly overall correlated with skin | High correlation |
| plas is highly overall correlated with insu | High correlation |
| preg is highly overall correlated with age | High correlation |
| skin is highly overall correlated with mass | High correlation |
| preg has 56 (14.3%) zeros | Zeros |
Reproduction
| Analysis started | 2024-03-30 00:18:08.155615 |
|---|---|
| Analysis finished | 2024-03-30 00:18:12.799168 |
| Duration | 4.64 seconds |
| Software version | ydata-profiling v4.7.0 |
| Download configuration | config.json |
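The reported duration is simply the difference between the two timestamps. A small sketch using only the standard library, with the timestamps copied from the table above:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-03-30 00:18:08.155615")
finished = datetime.fromisoformat("2024-03-30 00:18:12.799168")

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()

print(f"{duration_s:.2f} seconds")  # prints "4.64 seconds"
```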
preg
Real number (ℝ)
HIGH CORRELATION, ZEROS
| Distinct | 17 |
|---|---|
| Distinct (%) | 4.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.3010204 |
| Minimum | 0 |
| Maximum | 17 |
| Zeros | 56 |
| Zeros (%) | 14.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 5 |
| 95-th percentile | 10 |
| Maximum | 17 |
| Range | 17 |
| Interquartile range (IQR) | 4 |
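The last two rows of this table are derived from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A minimal check with the preg values above (variable names are illustrative):

```python
# Values copied from the preg "Quantile statistics" table.
minimum, q1, median, q3, maximum = 0, 1, 2, 5, 17

value_range = maximum - minimum  # spread of the full sample
iqr = q3 - q1                    # spread of the middle 50% of the sample

print(value_range, iqr)  # prints "17 4"
```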
Descriptive statistics
| Standard deviation | 3.2114245 |
|---|---|
| Coefficient of variation (CV) | 0.9728581 |
| Kurtosis | 1.4863417 |
| Mean | 3.3010204 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 1.3355963 |
| Sum | 1294 |
| Variance | 10.313247 |
| Monotonicity | Not monotonic |
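Several of these statistics are tied together by simple identities: the mean is the sum divided by the number of observations, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks the preg figures against those identities, within the rounding of the reported values; the variable names are illustrative.

```python
import math

# Figures copied from the report for preg.
n = 392               # Number of observations
total = 1294          # Sum
std = 3.2114245       # Standard deviation
variance = 10.313247  # Variance
reported_cv = 0.9728581

mean = total / n      # mean = sum / n
cv = std / mean       # coefficient of variation = std / mean

print(round(mean, 7))  # prints 3.3010204

# Both identities hold up to the rounding of the reported figures.
assert math.isclose(std ** 2, variance, rel_tol=1e-6)
assert math.isclose(cv, reported_cv, abs_tol=1e-6)
```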
Histogram with fixed size bins (bins=17)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 93 | 23.7% |
| 2 | 64 | 16.3% |
| 0 | 56 | 14.3% |
| 3 | 45 | 11.5% |
| 4 | 27 | 6.9% |
| 5 | 21 | 5.4% |
| 7 | 20 | 5.1% |
| 6 | 19 | 4.8% |
| 8 | 14 | 3.6% |
| 9 | 11 | 2.8% |
| Other values (7) | 22 | 5.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 56 | 14.3% |
| 1 | 93 | 23.7% |
| 2 | 64 | 16.3% |
| 3 | 45 | 11.5% |
| 4 | 27 | 6.9% |
| 5 | 21 | 5.4% |
| 6 | 19 | 4.8% |
| 7 | 20 | 5.1% |
| 8 | 14 | 3.6% |
| 9 | 11 | 2.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 17 | 1 | 0.3% |
| 15 | 1 | 0.3% |
| 14 | 1 | 0.3% |
| 13 | 3 | 0.8% |
| 12 | 5 | 1.3% |
| 11 | 5 | 1.3% |
| 10 | 6 | 1.5% |
| 9 | 11 | 2.8% |
| 8 | 14 | 3.6% |
| 7 | 20 | 5.1% |
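Each Frequency (%) entry is the row's count divided by the 392 observations, so the 56 rows with preg = 0 give the 14.3% quoted in the zeros alert. A minimal sketch of that calculation; `frequency_pct` is a hypothetical helper, not a ydata-profiling function:

```python
def frequency_pct(count: int, n_observations: int) -> float:
    """Share of observations taking a value, in percent, rounded to 0.1."""
    return round(100 * count / n_observations, 1)


N_OBSERVATIONS = 392

# A few (value -> count) pairs copied from the preg value tables.
preg_counts = {1: 93, 2: 64, 0: 56, 3: 45, 4: 27}

for value, count in preg_counts.items():
    print(f"preg={value}: {count} rows, {frequency_pct(count, N_OBSERVATIONS)}%")
```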
plas
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 117 |
|---|---|
| Distinct (%) | 29.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 122.62755 |
| Minimum | 56 |
| Maximum | 198 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 56 |
|---|---|
| 5-th percentile | 81 |
| Q1 | 99 |
| median | 119 |
| Q3 | 143 |
| 95-th percentile | 181 |
| Maximum | 198 |
| Range | 142 |
| Interquartile range (IQR) | 44 |
Descriptive statistics
| Standard deviation | 30.860781 |
|---|---|
| Coefficient of variation (CV) | 0.2516627 |
| Kurtosis | -0.48322696 |
| Mean | 122.62755 |
| Median Absolute Deviation (MAD) | 21 |
| Skewness | 0.51784994 |
| Sum | 48070 |
| Variance | 952.38778 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 100 | 14 | 3.6% |
| 99 | 10 | 2.6% |
| 129 | 9 | 2.3% |
| 95 | 8 | 2.0% |
| 88 | 8 | 2.0% |
| 126 | 7 | 1.8% |
| 117 | 7 | 1.8% |
| 128 | 7 | 1.8% |
| 109 | 7 | 1.8% |
| 112 | 6 | 1.5% |
| Other values (107) | 309 | 78.8% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 56 | 1 | 0.3% |
| 68 | 3 | 0.8% |
| 71 | 2 | 0.5% |
| 74 | 3 | 0.8% |
| 75 | 1 | 0.3% |
| 77 | 2 | 0.5% |
| 78 | 2 | 0.5% |
| 79 | 2 | 0.5% |
| 80 | 2 | 0.5% |
| 81 | 5 | 1.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 198 | 1 | 0.3% |
| 197 | 2 | 0.5% |
| 196 | 2 | 0.5% |
| 195 | 1 | 0.3% |
| 193 | 1 | 0.3% |
| 191 | 1 | 0.3% |
| 189 | 2 | 0.5% |
| 188 | 1 | 0.3% |
| 187 | 4 | 1.0% |
| 186 | 1 | 0.3% |
pres
Real number (ℝ)
| Distinct | 37 |
|---|---|
| Distinct (%) | 9.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 70.663265 |
| Minimum | 24 |
| Maximum | 110 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 24 |
|---|---|
| 5-th percentile | 50 |
| Q1 | 62 |
| median | 70 |
| Q3 | 78 |
| 95-th percentile | 90 |
| Maximum | 110 |
| Range | 86 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 12.496092 |
|---|---|
| Coefficient of variation (CV) | 0.17684 |
| Kurtosis | 0.79540444 |
| Mean | 70.663265 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.087516392 |
| Sum | 27700 |
| Variance | 156.1523 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=37)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 70 | 31 | 7.9% |
| 74 | 30 | 7.7% |
| 64 | 27 | 6.9% |
| 68 | 24 | 6.1% |
| 72 | 23 | 5.9% |
| 78 | 23 | 5.9% |
| 60 | 20 | 5.1% |
| 76 | 20 | 5.1% |
| 62 | 19 | 4.8% |
| 58 | 18 | 4.6% |
| Other values (27) | 157 | 40.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 1 | 0.3% |
| 30 | 2 | 0.5% |
| 38 | 1 | 0.3% |
| 40 | 1 | 0.3% |
| 44 | 3 | 0.8% |
| 46 | 2 | 0.5% |
| 48 | 3 | 0.8% |
| 50 | 10 | 2.6% |
| 52 | 6 | 1.5% |
| 54 | 8 | 2.0% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 110 | 2 | 0.5% |
| 106 | 2 | 0.5% |
| 102 | 1 | 0.3% |
| 100 | 2 | 0.5% |
| 98 | 1 | 0.3% |
| 94 | 2 | 0.5% |
| 92 | 1 | 0.3% |
| 90 | 11 | 2.8% |
| 88 | 15 | 3.8% |
| 86 | 11 | 2.8% |
skin
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 48 |
|---|---|
| Distinct (%) | 12.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.145408 |
| Minimum | 7 |
| Maximum | 63 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 7 |
|---|---|
| 5-th percentile | 13 |
| Q1 | 21 |
| median | 29 |
| Q3 | 37 |
| 95-th percentile | 46.45 |
| Maximum | 63 |
| Range | 56 |
| Interquartile range (IQR) | 16 |
Descriptive statistics
| Standard deviation | 10.516424 |
|---|---|
| Coefficient of variation (CV) | 0.3608261 |
| Kurtosis | -0.45769609 |
| Mean | 29.145408 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | 0.20931081 |
| Sum | 11425 |
| Variance | 110.59517 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=48)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 32 | 20 | 5.1% |
| 30 | 18 | 4.6% |
| 33 | 17 | 4.3% |
| 23 | 16 | 4.1% |
| 18 | 16 | 4.1% |
| 27 | 14 | 3.6% |
| 26 | 14 | 3.6% |
| 29 | 14 | 3.6% |
| 28 | 13 | 3.3% |
| 25 | 12 | 3.1% |
| Other values (38) | 238 | 60.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 2 | 0.5% |
| 8 | 1 | 0.3% |
| 10 | 3 | 0.8% |
| 11 | 5 | 1.3% |
| 12 | 6 | 1.5% |
| 13 | 10 | 2.6% |
| 14 | 6 | 1.5% |
| 15 | 11 | 2.8% |
| 16 | 5 | 1.3% |
| 17 | 10 | 2.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63 | 1 | 0.3% |
| 60 | 1 | 0.3% |
| 56 | 1 | 0.3% |
| 52 | 2 | 0.5% |
| 51 | 1 | 0.3% |
| 50 | 3 | 0.8% |
| 49 | 3 | 0.8% |
| 48 | 4 | 1.0% |
| 47 | 4 | 1.0% |
| 46 | 7 | 1.8% |
insu
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 184 |
|---|---|
| Distinct (%) | 46.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 156.05612 |
| Minimum | 14 |
| Maximum | 846 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 14 |
|---|---|
| 5-th percentile | 42.55 |
| Q1 | 76.75 |
| median | 125.5 |
| Q3 | 190 |
| 95-th percentile | 396.5 |
| Maximum | 846 |
| Range | 832 |
| Interquartile range (IQR) | 113.25 |
Descriptive statistics
| Standard deviation | 118.84169 |
|---|---|
| Coefficient of variation (CV) | 0.76153174 |
| Kurtosis | 6.3565051 |
| Mean | 156.05612 |
| Median Absolute Deviation (MAD) | 54.5 |
| Skewness | 2.1651162 |
| Sum | 61174 |
| Variance | 14123.347 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 105 | 11 | 2.8% |
| 130 | 9 | 2.3% |
| 140 | 9 | 2.3% |
| 120 | 8 | 2.0% |
| 94 | 7 | 1.8% |
| 180 | 7 | 1.8% |
| 100 | 7 | 1.8% |
| 135 | 6 | 1.5% |
| 115 | 6 | 1.5% |
| 110 | 6 | 1.5% |
| Other values (174) | 316 | 80.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 1 | 0.3% |
| 15 | 1 | 0.3% |
| 16 | 1 | 0.3% |
| 18 | 2 | 0.5% |
| 22 | 1 | 0.3% |
| 23 | 1 | 0.3% |
| 25 | 1 | 0.3% |
| 29 | 1 | 0.3% |
| 32 | 1 | 0.3% |
| 36 | 3 | 0.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 846 | 1 | 0.3% |
| 744 | 1 | 0.3% |
| 680 | 1 | 0.3% |
| 600 | 1 | 0.3% |
| 579 | 1 | 0.3% |
| 545 | 1 | 0.3% |
| 543 | 1 | 0.3% |
| 540 | 1 | 0.3% |
| 510 | 1 | 0.3% |
| 495 | 2 | 0.5% |
mass
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 194 |
|---|---|
| Distinct (%) | 49.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.086224 |
| Minimum | 18.2 |
| Maximum | 67.1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 18.2 |
|---|---|
| 5-th percentile | 22.255 |
| Q1 | 28.4 |
| median | 33.2 |
| Q3 | 37.1 |
| 95-th percentile | 45.245 |
| Maximum | 67.1 |
| Range | 48.9 |
| Interquartile range (IQR) | 8.7 |
Descriptive statistics
| Standard deviation | 7.0276592 |
|---|---|
| Coefficient of variation (CV) | 0.21240439 |
| Kurtosis | 1.5565131 |
| Mean | 33.086224 |
| Median Absolute Deviation (MAD) | 4.5 |
| Skewness | 0.66348506 |
| Sum | 12969.8 |
| Variance | 49.387994 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 33.3 | 7 | 1.8% |
| 32 | 7 | 1.8% |
| 31.6 | 6 | 1.5% |
| 33.6 | 5 | 1.3% |
| 35.5 | 5 | 1.3% |
| 25.2 | 5 | 1.3% |
| 28.7 | 5 | 1.3% |
| 30.8 | 5 | 1.3% |
| 33.2 | 5 | 1.3% |
| 39.4 | 5 | 1.3% |
| Other values (184) | 337 | 86.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18.2 | 1 | 0.3% |
| 19.3 | 1 | 0.3% |
| 19.4 | 1 | 0.3% |
| 19.5 | 2 | 0.5% |
| 19.6 | 2 | 0.5% |
| 20.1 | 1 | 0.3% |
| 20.4 | 2 | 0.5% |
| 20.8 | 2 | 0.5% |
| 21.1 | 1 | 0.3% |
| 21.2 | 1 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 67.1 | 1 | 0.3% |
| 59.4 | 1 | 0.3% |
| 57.3 | 1 | 0.3% |
| 55 | 1 | 0.3% |
| 53.2 | 1 | 0.3% |
| 52.3 | 1 | 0.3% |
| 49.7 | 1 | 0.3% |
| 47.9 | 1 | 0.3% |
| 46.8 | 1 | 0.3% |
| 46.7 | 1 | 0.3% |
pedi
Real number (ℝ)
| Distinct | 331 |
|---|---|
| Distinct (%) | 84.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.52304592 |
| Minimum | 0.085 |
| Maximum | 2.42 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 0.085 |
|---|---|
| 5-th percentile | 0.15355 |
| Q1 | 0.26975 |
| median | 0.4495 |
| Q3 | 0.687 |
| 95-th percentile | 1.16035 |
| Maximum | 2.42 |
| Range | 2.335 |
| Interquartile range (IQR) | 0.41725 |
Descriptive statistics
| Standard deviation | 0.34548804 |
|---|---|
| Coefficient of variation (CV) | 0.660531 |
| Kurtosis | 6.3666899 |
| Mean | 0.52304592 |
| Median Absolute Deviation (MAD) | 0.192 |
| Skewness | 1.9591012 |
| Sum | 205.034 |
| Variance | 0.11936199 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.692 | 4 | 1.0% |
| 0.452 | 3 | 0.8% |
| 0.687 | 3 | 0.8% |
| 0.422 | 3 | 0.8% |
| 0.299 | 3 | 0.8% |
| 0.26 | 3 | 0.8% |
| 0.496 | 3 | 0.8% |
| 0.261 | 3 | 0.8% |
| 0.637 | 2 | 0.5% |
| 0.678 | 2 | 0.5% |
| Other values (321) | 363 | 92.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.085 | 1 | 0.3% |
| 0.088 | 1 | 0.3% |
| 0.089 | 1 | 0.3% |
| 0.101 | 1 | 0.3% |
| 0.107 | 1 | 0.3% |
| 0.115 | 1 | 0.3% |
| 0.118 | 1 | 0.3% |
| 0.122 | 1 | 0.3% |
| 0.123 | 1 | 0.3% |
| 0.127 | 1 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.42 | 1 | 0.3% |
| 2.329 | 1 | 0.3% |
| 2.288 | 1 | 0.3% |
| 2.137 | 1 | 0.3% |
| 1.699 | 1 | 0.3% |
| 1.6 | 1 | 0.3% |
| 1.4 | 1 | 0.3% |
| 1.391 | 1 | 0.3% |
| 1.39 | 1 | 0.3% |
| 1.353 | 1 | 0.3% |
age
Real number (ℝ)
HIGH CORRELATION 
| Distinct | 43 |
|---|---|
| Distinct (%) | 11.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.864796 |
| Minimum | 21 |
| Maximum | 81 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 6.1 KiB |
Quantile statistics
| Minimum | 21 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 23 |
| median | 27 |
| Q3 | 36 |
| 95-th percentile | 52.45 |
| Maximum | 81 |
| Range | 60 |
| Interquartile range (IQR) | 13 |
Descriptive statistics
| Standard deviation | 10.200777 |
|---|---|
| Coefficient of variation (CV) | 0.33049875 |
| Kurtosis | 1.7375308 |
| Mean | 30.864796 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 1.4036065 |
| Sum | 12099 |
| Variance | 104.05584 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=43)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 22 | 43 | 11.0% |
| 21 | 33 | 8.4% |
| 24 | 31 | 7.9% |
| 25 | 30 | 7.7% |
| 23 | 28 | 7.1% |
| 26 | 24 | 6.1% |
| 28 | 21 | 5.4% |
| 29 | 14 | 3.6% |
| 27 | 14 | 3.6% |
| 31 | 12 | 3.1% |
| Other values (33) | 142 | 36.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 33 | 8.4% |
| 22 | 43 | 11.0% |
| 23 | 28 | 7.1% |
| 24 | 31 | 7.9% |
| 25 | 30 | 7.7% |
| 26 | 24 | 6.1% |
| 27 | 14 | 3.6% |
| 28 | 21 | 5.4% |
| 29 | 14 | 3.6% |
| 30 | 10 | 2.6% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 81 | 1 | 0.3% |
| 63 | 1 | 0.3% |
| 61 | 1 | 0.3% |
| 60 | 2 | 0.5% |
| 59 | 1 | 0.3% |
| 58 | 4 | 1.0% |
| 57 | 2 | 0.5% |
| 56 | 1 | 0.3% |
| 55 | 2 | 0.5% |
| 54 | 2 | 0.5% |
Correlations
| | age | insu | mass | pedi | plas | preg | pres | skin |
|---|---|---|---|---|---|---|---|---|
| age | 1.000 | 0.261 | 0.167 | 0.103 | 0.350 | 0.634 | 0.329 | 0.242 |
| insu | 0.261 | 1.000 | 0.301 | 0.132 | 0.659 | 0.123 | 0.132 | 0.241 |
| mass | 0.167 | 0.301 | 1.000 | 0.096 | 0.199 | -0.066 | 0.317 | 0.674 |
| pedi | 0.103 | 0.132 | 0.096 | 1.000 | 0.089 | 0.012 | -0.021 | 0.093 |
| plas | 0.350 | 0.659 | 0.199 | 0.089 | 1.000 | 0.190 | 0.237 | 0.216 |
| preg | 0.634 | 0.123 | -0.066 | 0.012 | 0.190 | 1.000 | 0.152 | 0.055 |
| pres | 0.329 | 0.132 | 0.317 | -0.021 | 0.237 | 0.152 | 1.000 | 0.250 |
| skin | 0.242 | 0.241 | 0.674 | 0.093 | 0.216 | 0.055 | 0.250 | 1.000 |
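The three variable pairs flagged as highly correlated can be recovered from this matrix by thresholding the absolute coefficient. The sketch below assumes a cutoff of 0.5; the report itself does not state the threshold, so treat that value (and the names `THRESHOLD` and `high_pairs`) as illustrative.

```python
from itertools import combinations

# Correlation matrix copied from the table above (same row/column order).
names = ["age", "insu", "mass", "pedi", "plas", "preg", "pres", "skin"]
matrix = [
    [1.000, 0.261, 0.167, 0.103, 0.350, 0.634, 0.329, 0.242],
    [0.261, 1.000, 0.301, 0.132, 0.659, 0.123, 0.132, 0.241],
    [0.167, 0.301, 1.000, 0.096, 0.199, -0.066, 0.317, 0.674],
    [0.103, 0.132, 0.096, 1.000, 0.089, 0.012, -0.021, 0.093],
    [0.350, 0.659, 0.199, 0.089, 1.000, 0.190, 0.237, 0.216],
    [0.634, 0.123, -0.066, 0.012, 0.190, 1.000, 0.152, 0.055],
    [0.329, 0.132, 0.317, -0.021, 0.237, 0.152, 1.000, 0.250],
    [0.242, 0.241, 0.674, 0.093, 0.216, 0.055, 0.250, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff; not stated in the report

# Visit each unordered pair once (upper triangle) and keep the strong ones.
high_pairs = [
    (names[i], names[j], matrix[i][j])
    for i, j in combinations(range(len(names)), 2)
    if abs(matrix[i][j]) >= THRESHOLD
]

for a, b, r in high_pairs:
    print(f"{a} ~ {b}: {r:.3f}")
```

Under that assumption the loop prints age ~ preg, insu ~ plas, and mass ~ skin, the same pairs listed in the alerts.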
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness.
First rows
| | preg | plas | pres | skin | insu | mass | pedi | age |
|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |
| 6 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 |
| 8 | 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 |
| 13 | 1 | 189 | 60 | 23 | 846 | 30.1 | 0.398 | 59 |
| 14 | 5 | 166 | 72 | 19 | 175 | 25.8 | 0.587 | 51 |
| 16 | 0 | 118 | 84 | 47 | 230 | 45.8 | 0.551 | 31 |
| 18 | 1 | 103 | 30 | 38 | 83 | 43.3 | 0.183 | 33 |
| 19 | 1 | 115 | 70 | 30 | 96 | 34.6 | 0.529 | 32 |
| 20 | 3 | 126 | 88 | 41 | 235 | 39.3 | 0.704 | 27 |
Last rows
| | preg | plas | pres | skin | insu | mass | pedi | age |
|---|---|---|---|---|---|---|---|---|
| 744 | 13 | 153 | 88 | 37 | 140 | 40.6 | 1.174 | 39 |
| 745 | 12 | 100 | 84 | 33 | 105 | 30.0 | 0.488 | 46 |
| 747 | 1 | 81 | 74 | 41 | 57 | 46.3 | 1.096 | 32 |
| 748 | 3 | 187 | 70 | 22 | 200 | 36.4 | 0.408 | 36 |
| 751 | 1 | 121 | 78 | 39 | 74 | 39.0 | 0.261 | 28 |
| 753 | 0 | 181 | 88 | 44 | 510 | 43.3 | 0.222 | 26 |
| 755 | 1 | 128 | 88 | 39 | 110 | 36.5 | 1.057 | 37 |
| 760 | 2 | 88 | 58 | 26 | 16 | 28.4 | 0.766 | 22 |
| 763 | 10 | 101 | 76 | 48 | 180 | 32.9 | 0.171 | 63 |
| 765 | 5 | 121 | 72 | 23 | 112 | 26.2 | 0.245 | 30 |